(54) Device and method for voice activity detection



(57) The invention relates to a device, a mobile apparatus incorporating the device, an accessory therefor and a method for voice activity detection, particularly in a mobile telephone, using the directional sensitivity of a microphone system and exploiting the knowledge about the voice source's orientation in space. The device comprises a sound signal analyser arranged to determine whether a sound signal comprises speech. According to the invention, the device further comprises a microphone system (2a, 2b, 2c, 2d, 2e) arranged to discriminate sounds emanating from sources located in different directions from the microphone system, so that only sounds emanating from a range of directions are included as signals possibly containing speech.
Description
Field of the invention 

[0001] The present invention relates to a device, a mobile apparatus incorporating the device, an accessory therefor and a method for voice activity detection, particularly in a mobile telephone, using the directional sensitivity of a microphone system and exploiting the knowledge about the voice source's orientation in space. The device assists the existing voice activity detection to achieve higher sensitivity while requiring less processor power.

State of the art 

[0002] Voice activity detectors are used e.g. in mobile phones to enhance the performance in certain situations. The most common way to construct a voice activity detector is to look at the levels of the sub-bands of the incoming signal. The background noise level and the speech level are then estimated and compared with a threshold to determine whether speech is present or not. An example of a voice activity detector is disclosed in U.S. patent 6,427,134.
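The sub-band approach described above can be illustrated with a toy detector. This is a minimal sketch under stated assumptions, not the detector of U.S. patent 6,427,134: each frame is assumed to arrive already split into sub-band energies (the filter bank is not shown), the first frame is assumed to contain only background noise, and the class name `SubbandVad`, the 6 dB threshold and the smoothing factor are hypothetical choices.

```python
import math
from typing import List, Optional, Sequence


class SubbandVad:
    """Toy sub-band energy VAD (illustrative only).

    Assumes each frame is supplied as a list of sub-band energies.
    """

    def __init__(self, num_bands: int, threshold_db: float = 6.0, smoothing: float = 0.95) -> None:
        self.num_bands = num_bands
        self.threshold_db = threshold_db  # hypothetical decision threshold
        self.smoothing = smoothing        # hypothetical noise-tracking factor
        self.noise: Optional[List[float]] = None

    def is_speech(self, band_energies: Sequence[float]) -> bool:
        if len(band_energies) != self.num_bands:
            raise ValueError("unexpected number of sub-bands")
        eps = 1e-12
        if self.noise is None:
            # Assumption: the first frame contains background noise only.
            self.noise = [max(e, eps) for e in band_energies]
            return False
        # Mean per-band signal-to-noise ratio in dB.
        snr_db = sum(
            10.0 * math.log10(max(e, eps) / n) for e, n in zip(band_energies, self.noise)
        ) / self.num_bands
        speech = snr_db > self.threshold_db
        if not speech:
            # Track the background noise level only in non-speech frames.
            self.noise = [
                self.smoothing * n + (1.0 - self.smoothing) * max(e, eps)
                for e, n in zip(band_energies, self.noise)
            ]
        return speech
```

Updating the noise estimate only in frames judged to be non-speech keeps speech energy from leaking into the background estimate; the single fixed threshold is the kind of parameter that is hard to tune for every situation.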

[0003] In noisy environments, for instance, it is hard to find a single parameter set-up for the voice activity detector that suits every case. Therefore several voice activity detectors are needed, each trimmed to a specific case. For example, some modules (such as the echo canceller) must be sure that any speech present is detected, whereas in other cases it is better to indicate no speech if the signal-to-noise ratio is too low. The plurality of voice activity detectors puts a load on the digital signal processors that have to run the various voice activity detection algorithms.



Summary of the invention 



[0004] An object of the present invention is to complement existing voice activity detection by taking into account the direction of the source of the sound.

[0005] In a first aspect, the invention provides a device for voice activity detection comprising a sound signal analyser arranged to determine whether a sound signal comprises speech.

[0006] According to the invention, the device further comprises a microphone system arranged to discriminate sounds emanating from sources located in different directions from the microphone system, so that only sounds emanating from a range of directions are included as signals possibly containing speech.

[0007] Suitably, the range of directions is directed in 
the direction of an intended user's mouth. 

[0008] In one embodiment, the microphone system comprises two microphone elements separated by a distance and located on a line directed in the direction of an intended user's mouth.

[0009] The range of directions may be defined as all sounds falling inside a cone with a cone angle α, wherein 10° < α < 30°; preferably, α is approximately 25°.

[0010] In another embodiment, the microphone system comprises three microphone elements separated by a distance and located in a plane directed in the direction of an intended user's mouth.

[0011] Suitably, two of said three microphone elements are separated by a distance and located on a line directed perpendicular to the direction of an intended user's mouth.

[0012] In another embodiment, the microphone system comprises four microphone elements located such that the fourth microphone is not located in the same plane as the other three.

[0013] The microphone elements may be directional 
with a pattern having maximal sensitivity in the direction 
of an intended user's mouth. 

[0014] In still a further embodiment, the microphone system comprises one directional microphone element together with one or more other microphone elements to remove the uncertainty in the direction of the sound source. The directional microphone element may be used to measure the sound pressure level relative to the other microphone element.

[0015] In a second aspect, the invention provides a 
mobile apparatus comprising a device as mentioned 
above. 

[0016] Suitably, the microphone elements are located at the lower edge of the apparatus.

[0017] In one embodiment, a plurality of microphone elements are located at the lower edge of the apparatus and at least one further microphone element is located at a distance from the lower edge.

[0018] The mobile apparatus may be a mobile radio terminal, e.g. a mobile telephone, a pager, a communicator, an electronic organiser or a smartphone.

[0019] In a third aspect, the invention provides an accessory for a mobile apparatus comprising a microphone system as mentioned above.

[0020] Suitably, the direction of the range of directions 
is adjustable. 

[0021] The accessory may be a hands-free kit or a telephone conference microphone.

[0022] In a fourth aspect, the invention provides a method for voice activity detection, including the steps of:

receiving sound signals from a microphone system arranged to discriminate sounds emanating from sources located in different directions from the microphone system;

determining the direction of the sound source causing the sound signals;

if the sounds emanate from a first range of directions, further analysing the sound to determine whether the sound signal comprises speech;

but if the sounds emanate from a second, different range of directions, deciding that the sound signal does not comprise speech.
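The method steps above can be sketched as a gate placed in front of an existing detector. In this minimal sketch the direction estimate is assumed to be computed elsewhere, the first range of directions is taken to be a cone around the mouth axis with the preferred 25° cone angle given in this document, and `detect_voice_activity` and its `analyser` callback are hypothetical names.

```python
from typing import Callable, Sequence


def detect_voice_activity(
    frame: Sequence[float],
    angle_deg: float,
    analyser: Callable[[Sequence[float]], bool],
    cone_angle_deg: float = 25.0,
) -> bool:
    """Direction-gated voice activity decision for one frame (illustrative sketch).

    angle_deg is the estimated angle between the sound source and the axis
    pointing at the intended user's mouth; analyser stands in for any
    conventional sound signal analyser.
    """
    if angle_deg < cone_angle_deg:
        # First range of directions: let the conventional analyser decide.
        return analyser(frame)
    # Second range of directions: declare "no speech" without running the analyser.
    return False
```

Because the analyser is simply skipped for off-axis frames, those signals never reach the conventional analysis, in line with [0032].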

[0023] Suitably, the first range of directions is directed 
in the direction of an intended user's mouth. 

[0024] The first range of directions may be defined as all sounds falling inside a cone with a cone angle α, wherein 10° < α < 30°; preferably, α is approximately 25°.

[0025] In one embodiment, the microphone system comprises at least two microphone elements located at a distance from each other on a line directed in the direction of an intended user's mouth, said two microphone elements being separated by a distance d, wherein the direction θ to the sound source is calculated as

$$\theta = \arccos\left(\frac{\Delta t \cdot v}{d}\right)$$

where

Δt is the time difference between the sounds from the two microphone elements,
v is the velocity of sound.
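The formula above translates directly into code. This sketch assumes a speed of sound of 343 m/s (dry air at about 20 °C), a sign convention in which Δt is positive when the sound reaches the microphone nearer the mouth first (so θ = 0 points at the mouth and a negative Δt gives θ > 90°), and it clamps the argument of arccos to [−1, 1] because measurement noise can push it slightly outside that range. `source_angle_deg` is a hypothetical helper name.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: air at roughly 20 °C


def source_angle_deg(delta_t_s: float, mic_distance_m: float, v: float = SPEED_OF_SOUND_M_S) -> float:
    """theta = arccos(delta_t * v / d), returned in degrees."""
    ratio = delta_t_s * v / mic_distance_m
    # Clamp: noisy delay estimates can give |ratio| slightly above 1.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.acos(ratio))
```

With d = 20 mm, `source_angle_deg(0.0, 0.02)` gives 90°, i.e. a source broadside to the microphone pair.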

[0026] In another embodiment, one directional microphone element is used together with one or more other microphone elements to remove the uncertainty in the direction of the sound source.

[0027] The directional microphone element may be 
used to measure the sound pressure level relative to the 
other microphone element. 

[0028] The invention is defined in the attached independent claims 1, 12, 16 and 20, while preferred embodiments are set forth in the dependent claims.



Brief description of the drawings 



[0029] The invention will be described below in greater detail with reference to the accompanying drawings, in which:



Fig. 1 is a perspective view of a mobile phone incorporating the present invention, and
Fig. 2 is a schematic drawing of the receiving angle of an embodiment of the present invention.



Detailed description of preferred embodiments 



[0030] As mentioned briefly in the introduction, many signal processing algorithms used in phones and hands-free kits, such as echo cancellation and background noise synthesis, depend on whether or not the user is speaking. For example, the speech codec is active when the near-end user is speaking, and the background noise synthesis is active when the near-end user is silent. All these algorithms need good voice activity detectors (VAD) to perform well. An error in the detection can result in artefacts or malfunctions caused by divergence of the algorithms, or in other problems.

[0031] Existing voice activity detectors are designed to determine whether or not speech is present in a sound signal. In fact, however, not all speech is of interest; only the user's speech is relevant. All other speech, e.g. from several persons speaking in a noisy environment, could be ignored and regarded as just noise.

[0032] The present inventor has realised that a microphone system having some kind of directional sensitivity could be used to discriminate sound emanating from different sources located in different directions. Sound not emanating from the user can be declared as non-speech, and those signals do not have to be analysed by the conventional voice activity detectors.

[0033] The existing voice activity detectors may be 
conventional and are only referred to as a sound signal 
analyser in this application. 

[0034] Generally, a microphone system having some kind of directional sensitivity can be used. Fig. 1 shows an example with at least two separate microphone elements.

[0035] A general mobile telephone is indicated at 1. The invention is equally applicable to other devices such as mobile radio terminals, pagers, communicators, electronic organisers or smartphones. The common feature is that voice activity detection is employed, e.g. in connection with communicating speech or receiving voice commands by means of speech recognition.

[0036] In the simplest version, the microphone system comprises two microphones 2a and 2b. Suitably, they are located on a line directed in the expected direction of an intended user's mouth. Suitably, the microphone elements are located at the lower edge of the mobile apparatus 1.

[0037] Fig. 2 shows a schematic diagram of the calculation of the direction of the sound source, typically the user's mouth 3. In the case of two microphones, only the angle to the line on which the microphone elements are located can be determined. In other words, the sound source lies on a cone with a cone angle θ. To calculate the angle θ, a cross-correlation between the two signals from the microphones 2a and 2b is first computed. The position of its maximum indicates the time difference Δt between the two microphones 2a and 2b. The distance d between the two microphones 2a and 2b is e.g. 20 millimetres. The angle θ is calculated as

$$\theta = \arccos\left(\frac{\Delta t \cdot v}{d}\right)$$

where v is the velocity of sound.

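[0038] Note that arccos is only defined for arguments between −1 and 1. Since the path difference between the two microphones cannot exceed d, |Δt| ≤ d/v for a real sound wave, so the argument stays within this range apart from measurement errors. If the time difference is negative, the angle is greater than 90° and the sound emanates from behind the apparatus.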
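The delay estimation of [0037], locating the maximum of the cross-correlation, can be sketched in plain Python. This is a brute-force illustration assuming already sampled signals and a search limited to ±`max_lag` samples; `estimate_delay_samples`, `near_mouth` and `far` are hypothetical names, and a practical implementation would more likely use an FFT-based correlation. A positive lag means the far signal is a delayed copy of the near-mouth signal, i.e. the microphone nearer the mouth heard the sound first.

```python
from typing import Sequence


def estimate_delay_samples(near_mouth: Sequence[float], far: Sequence[float], max_lag: int) -> int:
    """Return the lag (in samples) at which the cross-correlation peaks.

    r(lag) = sum_k near_mouth[k] * far[k + lag]; a positive result means the
    far signal lags the near-mouth signal.
    """
    best_lag, best_value = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        value = 0.0
        for k, x in enumerate(near_mouth):
            j = k + lag
            if 0 <= j < len(far):  # ignore samples outside the frame
                value += x * far[j]
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag
```

The time difference is then Δt = lag / fs for a sampling rate fs. With the 20 mm spacing mentioned above and v ≈ 343 m/s (assumed), the largest possible delay is about 58 µs, shorter than one sample period at 8 kHz (125 µs), so a higher sampling rate or sub-sample interpolation of the correlation peak would presumably be needed in practice.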

EP 1 489 596 A1

[0039] Suitably, the device is adapted to determine that all sounds with an angle θ less than a fixed angle α are emanating from the user. The threshold angle α may be set within a range of e.g. 10° to 30°, suitably at 25°.

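Because arccos is strictly decreasing on [−1, 1], the test θ < α of [0039] can be applied directly to the measured time difference, without evaluating arccos. The numbers below assume d = 20 mm (the example spacing in [0037]), α = 25° and v ≈ 343 m/s; the speed of sound is an assumption, not a value given in this document.

$$\theta < \alpha \iff \frac{\Delta t \cdot v}{d} > \cos\alpha \iff \Delta t > \frac{d\cos\alpha}{v}$$

$$\frac{d\cos\alpha}{v} = \frac{0.020\,\text{m} \times \cos 25^\circ}{343\,\text{m/s}} \approx 52.8\,\mu\text{s}, \qquad \frac{d}{v} \approx 58.3\,\mu\text{s}$$

With these values, only time differences between roughly 52.8 µs and the physical maximum of about 58.3 µs would be classified as coming from the user.
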
[0040] In the case of three microphones, the direction of the sound source can be narrowed down further to two possible directions (e.g. two lines on the above cone). The three microphone elements are suitably located in a plane directed in the general direction of the user's mouth. In Fig. 1, microphone elements 2b, 2c and 2d are a possible set-up. The two microphone elements 2c and 2d at the front are located on a line perpendicular to the direction of the user's mouth, while the third microphone element 2b is located at the rear side.

[0041] With four (or more) microphones, all direction angles can be determined, provided that the four microphone elements are located such that the fourth microphone is not in the same plane as the other three, e.g. at the corners of a tetrahedron. A possible set-up is two microphone elements 2c and 2d at the front on the lower edge, a third microphone element 2b at the rear side, and a fourth microphone element 2e at the front at a distance from the lower edge.
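For the four-microphone case, one common way to recover the full direction (not necessarily the one intended here) is a far-field, plane-wave model: for each microphone i relative to a reference microphone 0, (r_i − r_0) · u = v·τ_i, where u is the unit vector towards the source and τ_i = t_0 − t_i is how much earlier microphone i hears the sound. Three such equations determine u only if the four microphones are not coplanar, which matches the requirement that the fourth element lie outside the plane of the other three; for a single pair on the mouth axis the model reduces to the arccos formula of [0025]. The sketch solves the 3×3 system with Cramer's rule; the function names, the 343 m/s speed of sound and the far-field assumption are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def direction_from_four_mics(positions: Sequence[Vec3], tau: Sequence[float], v: float = 343.0) -> Vec3:
    """Unit vector towards a far-field source from three time differences.

    positions[0] is the reference microphone; tau[i - 1] = t_0 - t_i is how much
    earlier microphone i (i = 1..3) receives the sound than microphone 0.
    """
    r0 = positions[0]
    # Rows of the system matrix are the baselines r_i - r_0.
    a: List[List[float]] = [[positions[i][k] - r0[k] for k in range(3)] for i in range(1, 4)]
    b = [v * t for t in tau]  # path-length differences in metres
    det = _det3(a)
    if abs(det) < 1e-12:
        raise ValueError("microphones are (nearly) coplanar: direction is ambiguous")
    u = []
    for col in range(3):
        # Cramer's rule: replace column `col` with the right-hand side.
        m = [row[:] for row in a]
        for row in range(3):
            m[row][col] = b[row]
        u.append(_det3(m) / det)
    norm = math.sqrt(sum(c * c for c in u))
    if norm == 0.0:
        raise ValueError("all time differences are zero: direction undefined")
    # Normalise, since noisy delays rarely give an exact unit vector.
    return (u[0] / norm, u[1] / norm, u[2] / norm)
```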

[0042] A similar microphone arrangement may be used in an accessory to a mobile apparatus, such as a hands-free kit or a telephone conference microphone system intended to be placed on a table. Apart from the microphone elements, the logic circuitry may be located in the main (mobile) apparatus. In this case the reception angle of the microphone system can be made adjustable. This is useful e.g. when the microphone system is placed in a car, where the user may be seated either in the driver's seat or in the passenger's seat, or both the driver and the passenger may speak during the same call. The reception angle can be adjusted mechanically or electronically, for example by beam forming or by adapting the directional sensitivity of the microphone system.
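In software, an electronic adjustment of the reception range can be as simple as comparing the estimated source direction with a configurable look direction. This is a minimal sketch and not a beamformer: it assumes a direction estimate (for example a vector from a multi-microphone array) is already available, and the function name `within_reception_cone` and any look-direction values are hypothetical.

```python
import math
from typing import Sequence


def within_reception_cone(direction: Sequence[float], look: Sequence[float], cone_angle_deg: float = 25.0) -> bool:
    """True if the angle between the source direction and the configurable
    look direction is smaller than the cone angle."""
    dot = sum(a * b for a, b in zip(direction, look))
    norms = math.sqrt(sum(a * a for a in direction)) * math.sqrt(sum(b * b for b in look))
    if norms == 0.0:
        raise ValueError("direction and look vectors must be non-zero")
    # Clamp against rounding before taking arccos.
    cos_angle = max(-1.0, min(1.0, dot / norms))
    return math.degrees(math.acos(cos_angle)) < cone_angle_deg
```

Switching between driver and passenger then amounts to changing `look`; if both may speak during the same call, a frame could be accepted when it falls inside either cone, e.g. `any(within_reception_cone(u, w) for w in (driver_look, passenger_look))`, where `driver_look` and `passenger_look` are hypothetical configured vectors.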

[0043] To further enhance the sensitivity of the microphone system, directional microphone elements with a pattern having maximum sensitivity in the direction of the user's mouth could be used.

[0044] In a further embodiment, one directional microphone element is used together with one or two other microphone elements (which may be non-directional). The directional microphone element is used to measure the sound pressure level relative to the other(s), thus removing the uncertainty in the direction of the sound source. Various combinations of directional and non-directional microphone elements are possible.
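The level comparison of [0044] can be sketched as follows. Beyond what is stated here, the sketch assumes that the directional element is a cardioid aimed at the user's mouth and matched to the non-directional element in on-axis sensitivity. A cardioid's response (1 + cos φ)/2 is about 6 dB down at φ = 90°, so a relative level above roughly −6 dB suggests the source lies in the front half-space, which removes the front/back ambiguity of the two-microphone cone. The function names and the −6 dB default are hypothetical.

```python
import math
from typing import Sequence


def level_db(samples: Sequence[float]) -> float:
    """Mean-square level of a frame in dB (relative, uncalibrated)."""
    if not samples:
        raise ValueError("empty frame")
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(max(mean_square, 1e-20))


def source_in_front(directional: Sequence[float], omni: Sequence[float], threshold_db: float = -6.0) -> bool:
    """True if the directional element's level relative to the omni element
    suggests the source is in the directional element's front half-space."""
    return level_db(directional) - level_db(omni) > threshold_db
```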

[0045] The present invention leads to a voice activity detector having enhanced performance. With the present invention, only one voice activity detector may be necessary throughout the whole signal path. This in turn reduces the computational complexity, decreasing the load on the digital signal processors as well as improving the performance. It is especially favourable in environments with high background noise and with noise whose spectral properties are similar to those of speech.

[0046] A person skilled in the art will realise that the invention may be realised with various combinations of hardware and software. The scope of the invention is only limited by the claims below.



Claims 

1. A device for voice activity detection comprising a sound signal analyser arranged to determine whether a sound signal comprises speech, characterised by

a microphone system (2a, 2b, 2c, 2d, 2e) arranged to discriminate sounds emanating from sources located in different directions from the microphone system, so that only sounds emanating from a range of directions are included as signals possibly containing speech.

2. A device according to claim 1, characterised in that the range of directions is directed in the direction of an intended user's mouth (3).

3. A device according to claim 2, characterised in that the microphone system comprises two microphone elements (2a, 2b) separated by a distance and located on a line directed in the direction of an intended user's mouth (3).

4. A device according to claim 3, characterised in that the range of directions is defined as all sounds falling inside a cone with a cone angle α, wherein 10° < α < 30°.

5. A device according to claim 3, characterised in that α is approximately 25°.

6. A device according to claim 2, characterised in that the microphone system comprises three microphone elements (2b, 2c, 2d) separated by a distance and located in a plane directed in the direction of an intended user's mouth (3).

7. A device according to claim 6, characterised in that two (2c, 2d) of said three microphone elements are separated by a distance and located on a line directed perpendicular to the direction of an intended user's mouth (3).

8. A device according to claim 2, characterised in that the microphone system comprises four microphone elements (2b, 2c, 2d, 2e), located such that the fourth microphone (2e) is not located in the same plane as the three others (2b, 2c, 2d).

9. A device according to any one of claims 1 to 8, characterised in that the microphone elements (2a, 2b, 2c, 2d, 2e) are directional with a pattern having maximal sensitivity in the direction of an intended user's mouth (3).

10. A device according to claim 1, characterised in that the microphone system comprises one directional microphone element together with one or more other microphone elements adapted to remove the uncertainty in the direction of the sound source.

11. A device according to claim 10, characterised in that the directional microphone element is adapted to measure the sound pressure level relative to the other microphone element.

12. A mobile apparatus, characterised in that it comprises a device as defined in any one of claims 1 to 11.

13. A mobile apparatus according to claim 12, characterised in that the microphone elements (2a, 2b, 2c, 2d) are located at the lower edge of the apparatus.

14. A mobile apparatus according to claim 12, characterised in that a plurality of microphone elements (2a, 2b, 2c, 2d) are located at the lower edge of the apparatus and at least one further microphone element (2e) is located at a distance from the lower edge.

15. A mobile apparatus according to any one of claims 12 to 14, characterised in that it is a mobile radio terminal, e.g. a mobile telephone (1), a pager, a communicator, an electronic organiser or a smartphone.

16. An accessory for a mobile apparatus, characterised in that it comprises a microphone system (2a, 2b, 2c, 2d, 2e) as defined in any one of claims 1 to 11.

17. An accessory according to claim 16, characterised in that the direction of the range of directions is adjustable.

18. An accessory according to claim 16 or 17, characterised in that it is a hands-free kit.

19. An accessory according to claim 16 or 17, characterised in that it is a telephone conference microphone.

20. A method for voice activity detection, characterised by the steps of:

receiving sound signals from a microphone system (2a, 2b, 2c, 2d, 2e) arranged to discriminate sounds emanating from sources located in different directions from the microphone system;

determining the direction of the sound source causing the sound signals;

if the sounds emanate from a first range of directions, further analysing the sound to determine whether the sound signal comprises speech;

but if the sounds emanate from a second, different range of directions, deciding that the sound signal does not comprise speech.

21. A method according to claim 20, characterised in that the first range of directions is directed in the direction of an intended user's mouth (3).

22. A method according to claim 21, characterised in that the first range of directions is defined as all sounds falling inside a cone with a cone angle α, wherein 10° < α < 30°.

23. A method according to claim 22, characterised in that α is approximately 25°.

24. A method according to claim 22 or 23, characterised in that the microphone system comprises at least two microphone elements (2a, 2b) located at a distance from each other on a line directed in the direction of an intended user's mouth (3), said two microphone elements being separated by a distance d, wherein the direction θ to the sound source is calculated as

$$\theta = \arccos\left(\frac{\Delta t \cdot v}{d}\right)$$

where

Δt is the time difference between the sounds from the two microphone elements,
v is the velocity of sound.

25. A method according to claim 20, characterised in that one directional microphone element is used together with one or more other microphone elements to remove the uncertainty in the direction of the sound source.

26. A method according to claim 25, characterised in that the directional microphone element is used to measure the sound pressure level relative to the other microphone element.
